Statistical language model based on a hierarchical approach: MC_n^ν
Abstract
In this paper, we propose a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MC_n^ν, where n is the maximum number of words in a sequence and ν is the maximum number of levels. The originality of this model is its capacity to take into account dependent variable-length sequences for very large vocabularies. To discover the variable-length sequences and build the hierarchy, we use a set of 233 syntactic classes extracted from the 8 elementary French grammatical classes. The MC_n^ν model learns hierarchical word patterns and uses them to re-evaluate and filter the n-best utterance hypotheses output by our speech recognizer, MAUD. The model was trained on a corpus of 43 million words extracted from a French newspaper and uses a vocabulary of 20,000 words. Tests were conducted on 300 sentences. The results show a 17% decrease in perplexity compared with an interpolated class trigram model. Rescoring the original n-best hypotheses improved accuracy by 5%.
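The abstract evaluates the model with two standard measures: perplexity and n-best rescoring. The following Python sketch illustrates both under stated assumptions. It does not implement MC_n^ν itself, whose hierarchical sequence construction is not described here; a hypothetical toy bigram table (`make_toy_bigram_lm`) stands in for the language model, and combining acoustic and LM log-scores with a weight `lm_weight` is an assumed, commonly used log-linear scheme, not necessarily the one used with MAUD. `relative_reduction` shows how a "17% decrease in perplexity" reads if taken as a relative reduction.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A recognizer hypothesis: (word sequence, acoustic log-score).
Hypothesis = Tuple[List[str], float]


def perplexity(word_logprobs: Sequence[float]) -> float:
    """Perplexity from per-word natural-log probabilities: exp(-mean log p)."""
    if not word_logprobs:
        raise ValueError("need at least one word")
    return math.exp(-sum(word_logprobs) / len(word_logprobs))


def relative_reduction(baseline: float, new: float) -> float:
    """Relative decrease with respect to the baseline (0.17 means 17% lower)."""
    return (baseline - new) / baseline


def make_toy_bigram_lm(
    bigrams: Dict[Tuple[str, str], float], floor: float = 1e-4
) -> Callable[[List[str]], float]:
    """Hypothetical stand-in LM: a bigram table with a floor probability for unseen pairs."""

    def lm_logprob(words: List[str]) -> float:
        total, prev = 0.0, "<s>"
        for w in words + ["</s>"]:  # score the end-of-sentence transition too
            total += math.log(bigrams.get((prev, w), floor))
            prev = w
        return total

    return lm_logprob


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank hypotheses by acoustic score + lm_weight * LM log-probability, best first."""
    scored = [(words, ac + lm_weight * lm_logprob(words)) for words, ac in nbest]
    return sorted(scored, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    lm = make_toy_bigram_lm({("<s>", "le"): 0.5, ("le", "chat"): 0.2,
                             ("chat", "dort"): 0.3, ("dort", "</s>"): 0.6})
    # The acoustically better hypothesis contains an unseen word pair.
    nbest = [(["le", "chas", "dort"], -10.0), (["le", "chat", "dort"], -10.5)]
    print(rescore_nbest(nbest, lm)[0][0])  # ['le', 'chat', 'dort']
```

In this toy run the language model overturns the acoustic ranking: the second hypothesis has a slightly worse acoustic score but a much higher LM log-probability, which is the effect n-best rescoring relies on.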
Similar resources
Models of EFL Learners’ Vocabulary Development: Spreading Activation vs. Hierarchical Network Model
Semantic network approaches view organization or representation of internal lexicon in the form of either spreading or hierarchical system identified, respectively, as Spreading Activation Model (SAM) and Hierarchical Network Model (HNM). However, the validity of either model is amongst the intact issues in the literature which can be studied through basing the instruction compatible wi...
Intelligent identification of vehicle’s dynamics based on local model network
This paper proposes an intelligent approach for dynamic identification of the vehicles. The proposed approach is based on the data-driven identification and uses a high-performance local model network (LMN) for estimation of the vehicle’s longitudinal velocity, lateral acceleration and yaw rate. The proposed LMN requires no pre-defined standard vehicle model and uses measurement data to identif...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
A Model of Iranian EFL Learners' Cultural Identity: A Structural Equation Modeling Approach
This study aimed, firstly, to investigate the underlying components of Iranian cultural identity and, secondly, to confirm the aforementioned components via Structural Equation Modeling (SEM) analysis. In order to achieve these goals, the researchers reviewed the extensive local and international literature on language, culture and identity. Based on the literature and consultations with a grou...
Multi-Criteria Risk-Benefit Analysis of Health Care Management
Abstract Purpose of this paper: The objectives of this paper are two folds: (1) utilizing hierarchical fuzzy technique for order preference by similarity to ideal solution (TOPSIS) approach to evaluate the most suitable RFID-based systems decision, and (2) to highlight key risks and benefits of radio frequency identification technology in healthcare industry. Design/methodology/approach: R...
Publication date: 2001